fix(rax_tree): Fix crash caused by element destruction in RaxTreeMap #4228

Open
wants to merge 1 commit into
base: main
from

Conversation

BagritsevichStepan
Contributor

fixes #4172

This is the simplest fix. However, I propose a different solution that invokes raxFreeWithCallback. The catch is that it requires a capturing lambda, because we need to pass alloc_, and raxFreeWithCallback takes a plain function pointer. So we would need to copy raxFreeWithCallback and adapt the copy to accept capturing lambdas.

Additionally, I believe SeekIterator should be removed, as it seems to have caused the issue.
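
To illustrate the capturing-lambda constraint mentioned above, here is a minimal sketch. It does not use the real rax API: `ToyTree`, `ToyFreeWithCtx`, `CountingAlloc` and `DestroyValue` are hypothetical names invented for this example. It assumes one possible workaround, which is to give the copied free routine an extra `void* ctx` parameter so that a captureless function can still reach the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a C-style free callback: a plain function pointer
// with no room for captured state.
using FreeCallback = void (*)(void*);

// Hypothetical extended signature that threads caller state through `ctx`.
using FreeCallbackWithCtx = void (*)(void* data, void* ctx);

// Toy container standing in for the tree; it only stores opaque value pointers.
struct ToyTree {
  std::vector<void*> values;
};

// Toy analogue of a "free with callback" routine that also forwards a context.
inline void ToyFreeWithCtx(ToyTree* tree, FreeCallbackWithCtx cb, void* ctx) {
  for (void* v : tree->values) {
    if (cb != nullptr) cb(v, ctx);
  }
  tree->values.clear();
}

// Counting allocator so the example can check that every value is released.
template <typename T>
struct CountingAlloc {
  using value_type = T;
  int* live;  // number of outstanding allocations, owned by the caller
  explicit CountingAlloc(int* l) : live(l) {}
  T* allocate(std::size_t n) {
    ++*live;
    return std::allocator<T>{}.allocate(n);
  }
  void deallocate(T* p, std::size_t n) {
    --*live;
    std::allocator<T>{}.deallocate(p, n);
  }
};

// Captureless callback: the allocator arrives via `ctx`, not via a capture.
template <typename V, typename Alloc>
void DestroyValue(void* data, void* ctx) {
  auto* alloc = static_cast<Alloc*>(ctx);
  V* ptr = static_cast<V*>(data);
  std::allocator_traits<Alloc>::destroy(*alloc, ptr);
  std::allocator_traits<Alloc>::deallocate(*alloc, ptr, 1);
}

// Builds a toy tree with `n` values, frees it through the callback, and
// returns the number of allocations still live afterwards.
inline int FreeAllAndCountLive(int n) {
  int live = 0;
  using Alloc = CountingAlloc<std::string>;
  using Traits = std::allocator_traits<Alloc>;
  Alloc alloc(&live);
  ToyTree tree;
  for (int i = 0; i < n; ++i) {
    std::string* p = Traits::allocate(alloc, 1);
    Traits::construct(alloc, p, "value");
    tree.values.push_back(p);
  }
  ToyFreeWithCtx(&tree, &DestroyValue<std::string, Alloc>, &alloc);
  return live;
}

// Reports whether a capturing lambda converts to the plain callback type.
inline bool CapturingLambdaConverts() {
  int state = 0;
  auto capturing = [&state](void*) { ++state; };
  return std::is_convertible<decltype(capturing), FreeCallback>::value;
}
```

In this sketch `CapturingLambdaConverts()` reports that the conversion is unavailable, which is why the callback receives the allocator through `ctx` instead. Whether the actual change should take this shape or a templated callable is a design choice for the PR, not something this example settles.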

@BagritsevichStepan
Contributor Author

https://github.com/dragonflydb/dragonfly/actions/runs/12082557873

@romange
Collaborator

romange commented Nov 29, 2024

Can you please explain what the problem is in the PR description?
i.e. the cause of the crash is ..., this solution solves the problem by ..., but I suggest the following solution ...

@romange romange self-requested a review November 29, 2024 12:43
V* ptr = &(*it).second;
std::allocator_traits<decltype(alloc_)>::destroy(alloc_, ptr);
alloc_.deallocate(ptr, 1);
while (true) {
Contributor

@kostasrim kostasrim Nov 29, 2024

There are two things that concern me here:

  1. Looking at how raxRecursiveFree is implemented, it calls free_callback(raxGetData(n)). What I introduced before seems to be on par in that respect, because V* ptr = &(*it).second; is actually the result of raxGetData(n).
  2. Since we can reproduce this, we should also include an isolated test to assert the root cause of the issue (my mistake not to suggest this earlier).

I would like to understand why deleting the result of raxGetData(n) results in a segfault.

P.S. I also checked what Valkey is doing, and indeed they use free_callback (they don't iterate like we do), but let's make sure we understand this.

Contributor Author

@BagritsevichStepan BagritsevichStepan Nov 29, 2024

raxFree passes a NULL callback to raxRecursiveFree

Contributor Author

So it's not called there
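
The point in this thread can be sketched with a toy recursive free. This is not the real rax code: `ToyNode`, `ToyRecursiveFree` and `ToyFree` are hypothetical names, and the node layout is an assumption made only for illustration. The sketch shows that when the entry point forwards a null callback, the per-node payload is never passed to any free function, so the caller remains responsible for releasing it.

```cpp
#include <cassert>
#include <vector>

// Toy node, loosely modeled on a radix-tree node: an optional payload plus
// children. This is NOT the real rax node layout.
struct ToyNode {
  void* data = nullptr;
  std::vector<ToyNode*> children;
};

// Recursively frees nodes. The payload is handed to `free_callback` only when
// a callback is supplied; with a null callback the payload is left untouched.
inline void ToyRecursiveFree(ToyNode* n, void (*free_callback)(void*)) {
  for (ToyNode* child : n->children) ToyRecursiveFree(child, free_callback);
  if (free_callback != nullptr && n->data != nullptr) free_callback(n->data);
  delete n;
}

// Mirrors a "free without callback" entry point: forwards a null callback.
inline void ToyFree(ToyNode* root) { ToyRecursiveFree(root, nullptr); }

static int g_freed = 0;  // counts payloads released through the callback

static void CountAndDelete(void* p) {
  ++g_freed;
  delete static_cast<int*>(p);
}

// Builds a root with two children, each owning an int payload, and records
// the payload pointers so the caller can release them if nobody else does.
inline ToyNode* MakeTree(std::vector<int*>* payloads) {
  auto* root = new ToyNode;
  for (int i = 0; i < 2; ++i) {
    auto* child = new ToyNode;
    child->data = new int(i);
    payloads->push_back(static_cast<int*>(child->data));
    root->children.push_back(child);
  }
  return root;
}

// Returns how many payloads were released when a callback is supplied.
inline int FreedWithCallback() {
  g_freed = 0;
  std::vector<int*> payloads;
  ToyRecursiveFree(MakeTree(&payloads), &CountAndDelete);
  return g_freed;
}

// Returns how many payloads were released when no callback is supplied.
inline int FreedWithoutCallback() {
  g_freed = 0;
  std::vector<int*> payloads;
  ToyFree(MakeTree(&payloads));
  int freed = g_freed;
  for (int* p : payloads) delete p;  // the caller must release payloads itself
  return freed;
}
```

This only models who owns the values on each path. It does not reproduce or explain the crash discussed in this PR.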

Collaborator

@BagritsevichStepan do you know how to reproduce this?

Successfully merging this pull request may close these issues.

Dragonfly crashes during RaxTreeMap::try_emplace
3 participants